Final  Report  AFOSR  FA9550-06-1-0408 


Studies  of  single  biomolecules,  DNA  conformational  dynamics, 

and  protein  binding 

Andreas  Hanke 

Department  of  Physics  and  Astronomy, 

University  of  Texas,  80  Fort  Brown,  Brownsville 

Abstract 

While the Watson-Crick double-strand is the thermodynamically stable state of DNA in a wide
range of temperature and salt conditions, even at physiological conditions local denaturation
bubbles may open up spontaneously due to thermal activation. By raising the ambient temperature,
by titration, or by external forces in single molecule setups, bubbles proliferate until full denaturation
of  the  DNA.  Based  on  the  Poland-Scheraga  model  we  investigate  both  the  equilibrium  transition 
of  DNA  denaturation  and  the  dynamics  of  the  denaturation  bubbles  with  respect  to  recent  single 
DNA  chain  experiments  for  situations  below,  at,  and  above  the  denaturation  transition.  We  also 
propose  a  new  single  molecule  setup  based  on  DNA  constructs  with  two  bubble  zones  to  measure 
the  bubble  coalescence  and  extract  the  physical  parameters  relevant  to  DNA  breathing.  Finally  we 
consider  the  interplay  between  denaturation  bubbles  and  selectively  single  stranded  DNA  binding 
proteins. 

FIG. 1: Ladder structure of DNA showing the Watson-Crick bonding of the bases A, T, G, and C
which are suspended by a sugar-phosphate backbone. Each phosphate carries a negative charge.
The longitudinal distance between adjacent base pairs is 3.43 Å while approximately 10.5 base pairs
are needed to form a complete helical turn. Under normal salt conditions the persistence length
of double stranded DNA is approximately 50 nm, the hard core diameter is approximately 2 nm.
Locally (i.e., for lengths shorter than the persistence length), DNA appears thin and stiff while on
longer scales it can be perceived as a flexible polymer. The length of a single DNA molecule varies
from several μm in viral DNA over several mm in bacteria up to many cm in eukaryotic cells.

I.  INTRODUCTION 

Deoxyribonucleic  acid  (DNA)  is  the  molecule  of  life,  encoding  the  complete  genetic  infor¬ 
mation  of  an  entire  organism.  This  information  is  kept  in  terms  of  the  four  letter  alphabet 
comprised  by  Adenine,  Guanine,  Cytosine,  and  Thymine.  The  genetic  code  is  stabilised  by 
base  pairing  through  hydrogen  bonding  that  creates  two  complementary  strands  subject  to 
the  key-lock  principle.  This  way  it  is  made  sure  that  exclusively  AT  and  GC  nucleotides 
pair.  Within  this  ladder  structure  (Fig.  1)  the  bases  and  thus  the  genetic  code  are  protected 
against  unwanted  action  of  chemicals  and  proteins.  The  three-dimensional  structure  of  DNA 
is  the  famed  Watson-Crick  double-helix,  the  equilibrium  structure  of  DNA  within  a  broad 
range  of  salt  and  temperature  conditions.  Sufficiently  close  to  physiological  conditions  the 
typical conformation of double-stranded DNA is the B form with a rise of 3.4 Å between
successive base pairs and approximately 10.5 base pairs needed to form one complete turn of
the helix. This thermodynamic stability, apart from hydrogen bonding between paired bases,
is mainly effected by base stacking between nearest-neighbour pairs of base pairs [1-6].

FIG. 2: Thermal denaturation of double stranded DNA: Fraction $\theta_h$ of double-helical domains
within the DNA as a function of temperature. Schematic representation of $\theta_h(T)$, showing the
increased formation of bubbles and unzipping from the ends, until full denaturation has been
reached. Note that bacterial DNA is predominantly circular so that no end effects occur. Also
viral DNA circularises once injected into a host cell.

By temperature increase or variation of the pH (titration with acid or alkali) double-stranded
DNA progressively denatures. The comparatively stiff DNA double-strand (persistence
length circa 50 nm) is thereby interrupted by emerging zones of flexible single-strand
(persistence length circa 1 to a few nm). These so-called DNA bubbles then grow and merge
until the double-strand is fully molten (Fig. 2). This is the helix-coil transition. The melting
temperature $T_m$ is experimentally defined as the temperature at which half of the DNA
molecule has undergone denaturation [3, 7, 8]. Typically, the denaturation starts in regions
rich  in  the  weaker  AT  base-pairs,  and  subsequently  moves  to  zones  of  increasing  GC  content. 
The  occurrence  of  zones  of  different  stability  within  the  genome  was  shown  to  be  relevant 
when  separating  coding  from  non-coding  regions  [9,  10]  and  is  believed  to  be  related  to  DNA 
function, for instance the occurrence of weak regions (e.g., the TATA motif) at transcription
initiation  points. 

Albeit  rare,  already  at  room  temperature  thermal  fluctuations  cause  opening  events  of 
small intermittent denaturation bubbles [11]. The size of these bubbles fluctuates by stepwise
zipping and unzipping of the base pairs at the zipper forks where the bubble connects
to the intact DNA double-strand (bubble breathing). Initiation of a bubble in a
stretch of double-strand requires the crossing of a free energy barrier $F_s$ of approximately 8
kcal/mol (some 10 $k_BT$ at physiological temperature) corresponding to a Boltzmann factor
$\sigma_0 = \exp(-F_s/k_BT) \simeq 10^{-5} \ldots 10^{-3}$. $\sigma_0$ is often referred to as the cooperativity factor. Once
formed below the melting temperature $T_m$, a bubble will eventually zip close. Above $T_m$, a
bubble  will  preferentially  stay  open  and,  if  unconstrained,  grow  in  size  until  it  merges  with 
other  denaturation  bubbles,  eventually  leading  to  full  denaturation.  Constraints  against 
such  full  unzipping  could,  for  instance,  be  the  build-up  of  twist  in  smaller  DNA-rings  [12], 
the highly positively supercoiled state (linking excess) in the DNA of extremophile bacteria
existing at high temperatures in deep-sea vents [13], or the chemical connection of the two
strands by short bulge-loops, compare Ref. [14]. In heteropolymer DNA, mechanical stretching
experiments show that even at the end of the overstretching transition and beyond
the two strands do not separate completely [15-17] but are still held together by isolated
GC-rich regions along the chain with average distance of a few hundreds of base pairs [16].
These  GC-rich  regions  break  only  at  a  much  larger  force  than  the  melting  force  Fm  of  the 
overstretching  plateau. 

Biologically the physical conformations of DNA molecules are recognised to be of inalienable
relevance for its function, see, for instance, the review [18] and references therein. In
particular, the existence of intermittent though infrequent bubble domains is important as
the opening up of the Watson-Crick base pairs by breaking of the hydrogen bonds between
complementary bases disrupts the helical stack. The associated flipping out of the ordered
stack of the unpaired bases allows the binding of specific chemicals or proteins that otherwise
would not be able to access the reactive sites of the bases [3, 6, 7, 11]. Indeed there
exists a competition of time scales between the survival of DNA bubbles and the binding
kinetics of selectively single-stranded DNA binding proteins [19-21]. As an important aspect
of the biological function of DNA, it is believed that DNA breathing assists transcription
initiation [22-25], see below. Altogether it appears fair to say that the quantitative knowledge
of  the  energetics  of  the  denaturation  as  well  as  the  dynamics  of  bubbles  is  imperative  to 
a  better  understanding  of  genomic  biochemical  processes.  Additionally  DNA  denaturation 
is  a  fine  example  of  a  well  defined  and  chemically  stable  system  whose  physical  properties 
can be probed in detail on the level of single molecules. DNA is therefore studied from
both viewpoints: biological physics, with respect to DNA's role in biochemical processes, and
statistical physics, for which DNA provides an ideal system to quantitatively study polymer
models. 

In what follows we will base our analysis on the Poland-Scheraga free energy model
treating the DNA molecule as an Ising-type system of a sequence of 'spins' with open or
closed states plus a non-local term that takes care of polymeric effects within denaturation
bubbles made up of highly flexible DNA single-strand. A prominent alternative description
of DNA denaturation and breathing is the Peyrard-Bishop-Dauxois (PBD) model [26, 27]
based on the set of Langevin equations [28]

$$m\frac{d^2 y_n}{dt^2} = -\frac{dV(y_n)}{dy_n} - \frac{dW(y_{n+1},y_n)}{dy_n} - \frac{dW(y_n,y_{n-1})}{dy_n} - m\gamma\frac{dy_n}{dt} + \xi_n(t). \qquad (1)$$

Here, $V(y_n) = D_n[\exp(-a_n y_n) - 1]^2$ is a Morse potential for the hydrogen bonding,
$D_n$ and $a_n$ assuming two different values for AT and GC bps; $W(y, y') =
\frac{k}{2}[1 + \rho\exp\{-\beta(y + y')\}](y - y')^2$ is a nonlinear potential to include bp-bp stacking
interactions between adjacent bps $y$ and $y'$. The parameters $k$, $\rho$, $\beta$, $\gamma$, and $m$ are independent
of the sequence. The equation is driven by the thermal noise $\xi_n(t)$. Usually, the stochastic
equations (1) are integrated numerically [28]. Due to its formulation in terms of a set of
Langevin equations, the PBD model is very appealing, and it is a useful model to study some
generic features of DNA denaturation. Its disadvantage is that somewhat arbitrary values
for the model parameters need to be chosen while (apart from the characteristic time scale)
all parameters in the Poland-Scheraga model are available from a large body of experiments.
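
To illustrate what numerical integration of Eq. (1) can look like, the following is a minimal sketch and not the scheme used in Ref. [28]. It assumes a semi-implicit Euler update, free chain ends, and a noise correlation $\langle\xi_n(t)\xi_{n'}(t')\rangle = 2m\gamma k_BT\,\delta_{nn'}\delta(t-t')$; none of these choices is specified in the text. All parameter values are dimensionless and purely illustrative, and the function names are hypothetical.

```python
import math
import random

def morse_force(y, D, a):
    """Return -dV/dy for the Morse potential V(y) = D * (exp(-a*y) - 1)**2."""
    e = math.exp(-a * y)
    return 2.0 * a * D * e * (e - 1.0)

def stacking_grad(y, y_other, k, rho, beta):
    """Return dW/dy for W(y, y') = (k/2) * [1 + rho*exp(-beta*(y+y'))] * (y-y')**2.
    W is symmetric in its arguments, so the same expression serves both neighbours."""
    e = rho * math.exp(-beta * (y + y_other))
    diff = y - y_other
    return 0.5 * k * (2.0 * (1.0 + e) * diff - beta * e * diff * diff)

def pbd_step(y, v, D, a, k, rho, beta, m, gamma, kT, dt, rng):
    """Advance all stretchings y_n and velocities v_n by one time step dt."""
    n_bp = len(y)
    # Assumed fluctuation-dissipation amplitude of the discretised noise.
    noise_amp = math.sqrt(2.0 * m * gamma * kT / dt)
    new_v = []
    for n in range(n_bp):
        force = morse_force(y[n], D[n], a[n])
        if n + 1 < n_bp:                      # stacking with the right neighbour
            force -= stacking_grad(y[n], y[n + 1], k, rho, beta)
        if n > 0:                             # stacking with the left neighbour
            force -= stacking_grad(y[n], y[n - 1], k, rho, beta)
        force += -m * gamma * v[n] + noise_amp * rng.gauss(0.0, 1.0)
        new_v.append(v[n] + dt * force / m)
    new_y = [y[n] + dt * new_v[n] for n in range(n_bp)]
    return new_y, new_v

def simulate(D, a, steps, kT, seed=0, dt=0.01):
    """Integrate a chain that starts fully closed (y_n = 0) and at rest."""
    rng = random.Random(seed)
    y = [0.0] * len(D)
    v = [0.0] * len(D)
    for _ in range(steps):
        # Illustrative, sequence-independent parameters (arbitrary units).
        y, v = pbd_step(y, v, D, a, k=1.0, rho=1.0, beta=0.5,
                        m=1.0, gamma=0.5, kT=kT, dt=dt, rng=rng)
    return y
```

A sequence enters only through the per-site lists `D` and `a`, for example a larger well depth for sites meant to mimic GC pairs. Without noise (`kT=0`) the closed chain sits at the minimum of both potentials and does not move, which is a quick consistency check of the force expressions.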

We  first  address  the  denaturation  transition  at  equilibrium  both  in  absence  and  presence 
of  an  external  stretching  force.  Subsequently  we  will  present  two  model  approaches  to  the 
breathing  dynamics  of  a  single  denaturation  bubble.  In  Section  4  we  discuss  the  coalescence 
dynamics  of  two  DNA  bubbles.  Finally,  in  Section  5  we  address  the  coupling  of  the  breathing 
dynamics  of  a  DNA  bubble  with  the  binding/unbinding  of  proteins  that  specifically  bind  to 
single-stranded  DNA. 




II.  DNA  DENATURATION  IN  PRESENCE  OF  A  MODEST  STRETCHING 
FORCE 

A  convenient  method  to  treat  the  denaturation  transition  is  to  consider  the  chain  in  the 
grand  canonical  ensemble  in  which  the  total  number  N  of  bps  and  the  end-to-end  vector  L 
fluctuate.  The  partition  function  in  d  =  3  of  the  DNA  chain  under  external  forcing  with 
force  F  in  x  direction  becomes  [29] 

$$Z(z,F) = \sum_{N=1}^{\infty}\int d^3L\; Z_{\mathrm{can}}(N,\mathbf{L})\, z^N \exp(\beta F L_x) \qquad (2)$$

with $\beta = 1/(k_BT)$. $Z_{\mathrm{can}}(N, \mathbf{L})$ is the canonical partition function of a chain of $N$ bps with
fixed end-to-end vector $\mathbf{L}$, $z$ is the fugacity, and $L_x$ the $x$-component of $\mathbf{L}$ (Fig. 3). Assuming
that bound segments and bubbles are independent, $Z$ factorises:

$$Z = \Omega_e + \Omega_e\left\{\sum_{n=0}^{\infty}[B\Omega]^n\right\}B\Omega_e = \Omega_e + \frac{\Omega_e^2 B}{1 - B\Omega}. \qquad (3)$$

The alternating sequence of bound segments and bubbles with weights $B$ and $\Omega$ in Eq. (3) is
complemented by the weight $\Omega_e$ of an open end unit at both ends of the chain. We assume
that  only  one  strand  of  the  end  unit  is  bound  to  the,  say,  magnetic  bead,  while  the  other 
strand  is  moving  freely.  Thus  the  first  term  on  the  right  hand  side  of  Eq.  (3)  denotes  the  two 
unbound  single  strands  of  completely  denatured  DNA;  here  we  assume  that  one  of  the  two 
strands  is  still  attached  between  origin  O  and  end  point  L,  being  subject  to  the  stretching 
force  F . 

A bound segment with $k = 1, 2, \ldots$ bps is modelled as a rigid rod of length $ak$ where
$a = 0.34$ nm is the length of a bound bp in B-DNA [30]. Here we assume a homopolymer
with  binding  energy  E0  <  0  per  base  pair.  However,  we  assume  perfect  matching  throughout 
the  transition  such  that  in  a  denaturation  bubble  both  single  stranded  arches  carry  equal 
length.  This  assumption  is  in  line  with  above  remark  that  due  to  stable  GC-rich  islands  in 
the  structure  during  a  force-induced  denaturation  the  sequence  of  separated  denaturation 
bubbles  and  intact  double-strand  persists  to  much  larger  forces  than  the  melting  force  Fm 
of  the  overstretching  plateau  [16].  The  statistical  weight  of  a  segment  with  fixed  number 
$k$ and fixed orientation is then $\omega^k$ with $\omega = \exp(\beta\varepsilon)$ and $\varepsilon = -E_0 > 0$. Assuming that $k$
fluctuates with fixed fugacity $z$, and that the segment rotates around one end while subject to the force $F$
(Fig. 3), the statistical weight of the segment for fixed $z$ and $F$ becomes [29]

$$B(z,\omega,F) = \frac{1}{2y}\ln\left(\frac{1-\omega z e^{-y}}{1-\omega z e^{y}}\right), \qquad y = \beta F a. \qquad (4)$$

FIG. 3: Stretched DNA in the PS model with bound segments $B$ and denatured loops $\Omega$. The
DNA is attached between $O$ and $\mathbf{L}$ and subject to the stretching force $F$ in $x$-direction. Perfect
matching in heterogeneous DNA requires both arches of a loop to have equal length $\ell$.

At $F = 0$, $B(z, \omega, 0) = \omega z/(1 - \omega z)$ as found previously for free DNA [31].
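
The bound-segment weight and its zero-force limit can be checked numerically. The sketch below assumes that Eq. (4) reads $B = (2y)^{-1}\ln[(1-\omega z e^{-y})/(1-\omega z e^{y})]$ with $y = \beta F a$, and that it arises from summing $(\omega z)^k$ times the orientation average $\sinh(yk)/(yk)$ of a freely rotating rod of length $ak$. Both the series representation and the function names are assumptions made for illustration.

```python
import math

def bound_weight_closed(z, omega, y):
    """Closed form assumed for Eq. (4); reduces to omega*z/(1 - omega*z) at y = 0."""
    wz = omega * z
    if wz * math.exp(abs(y)) >= 1.0:
        raise ValueError("series diverges: need omega*z*exp(|y|) < 1")
    if y == 0.0:
        return wz / (1.0 - wz)
    ratio = (1.0 - wz * math.exp(-y)) / (1.0 - wz * math.exp(y))
    return math.log(ratio) / (2.0 * y)

def bound_weight_series(z, omega, y, k_max=400):
    """Direct sum over segment lengths k of (omega*z)**k * sinh(y*k)/(y*k), y != 0."""
    wz = omega * z
    total = 0.0
    for k in range(1, k_max + 1):
        total += wz ** k * math.sinh(y * k) / (y * k)
    return total
```

For example, `bound_weight_closed(0.25, 2.0, 0.3)` and `bound_weight_series(0.25, 2.0, 0.3)` should agree to high precision, and for very small `y` the closed form approaches the free-DNA value $\omega z/(1-\omega z)$.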

A denatured loop is considered as a closed random walk with $2\ell$ monomers, corresponding
to $\ell$ broken bps. The loop starts at $O$ and visits the point $\mathbf{r}$ after $\ell$ monomers (Fig. 3). The
number of configurations of such a loop becomes

$$\Omega(\ell,\mathbf{r}) = C_0(2\ell)\,p_\ell(\mathbf{r}) \qquad (5)$$

where $C_0(2\ell)$ counts the configurations of a loop of length $2\ell$ starting at $O$ and $p_\ell(\mathbf{r})$ is the
probability that the loop visits $\mathbf{r}$ after $\ell$ monomers. For an ideal random walk in $d = 3$,
$C_0(2\ell) \sim \mu^{2\ell}\ell^{-3/2}$ ($\mu$ is the connectivity constant [89]) and $p_\ell(\mathbf{r}) \sim \mathcal{R}^{-3}\exp[-\lambda(r/\mathcal{R})^2]$ where
$\lambda > 0$, $r = |\mathbf{r}|$, and $\mathcal{R} = b\ell^{1/2}$ is the scaling length of the walk. The coefficient $b$ is
proportional to the persistence length. Thus, $\Omega(\ell,\mathbf{r}) \sim s^\ell \ell^{-3}\exp[-\lambda(r/\mathcal{R})^2]$ where $s = \mu^2$.
We assume that $\mathbf{r}$ moves freely and is subject to the force $F$ in the positive $x$-direction. The
weight of an ideal random loop for fixed $\ell$ and $F$ is given by the Gaussian integral

$$\Omega(\ell,F) = \int d^3r\;\Omega(\ell,\mathbf{r})\,e^{\beta F x} = A s^\ell \ell^{-c}\exp(\alpha y^2 \ell) \qquad (6)$$

where $A$ is an amplitude proportional to the cooperativity factor $\sigma_0$, $c = 3/2$, and $\alpha =
b^2/(4\lambda a^2)$.
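
The force dependence of the loop weight follows from completing the square in the Gaussian integral. As a hedged numerical check, the sketch below integrates only the $x$-part, $\int dx\,\exp(-\lambda x^2/\mathcal{R}^2 + \beta F x)$, and compares it with $\sqrt{\pi/\lambda}\,\mathcal{R}\exp[(\beta F\mathcal{R})^2/(4\lambda)]$; the $y$- and $z$-parts each contribute a force-independent factor $\sqrt{\pi/\lambda}\,\mathcal{R}$, which together with the prefactor $\ell^{-3}$ gives $\ell^{-3/2}$. Function names and parameter values are illustrative assumptions.

```python
import math

def x_integral_numeric(lam, R, beta_F, half_width=12.0, n=20001):
    """Trapezoidal estimate of int dx exp(-lam*x^2/R^2 + beta_F*x)."""
    sigma = R / math.sqrt(2.0 * lam)            # width of the Gaussian
    centre = beta_F * R * R / (2.0 * lam)       # position of its maximum
    lo = centre - half_width * sigma
    h = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * math.exp(-lam * x * x / (R * R) + beta_F * x)
    return total * h

def x_integral_closed(lam, R, beta_F):
    """Closed form obtained by completing the square."""
    return math.sqrt(math.pi / lam) * R * math.exp((beta_F * R) ** 2 / (4.0 * lam))

def force_exponent(ell, y, a, b, lam):
    """Exponent (beta F R)^2 / (4 lam) with R^2 = b^2 * ell and beta F = y / a."""
    return (y / a) ** 2 * b * b * ell / (4.0 * lam)
```

With $\mathcal{R}^2 = b^2\ell$ and $y = \beta F a$, `force_exponent` equals $\alpha y^2\ell$ for $\alpha = b^2/(4\lambda a^2)$, which is the exponential factor quoted in Eq. (6).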

Generally, the statistical weight of a loop in both presence and absence of the external forcing is of
the power-law form

$$\Omega \sim \ell^{-c}. \qquad (7)$$

For free DNA it was found that the nature of the denaturation transition is determined by the
value of the critical exponent $c$: for $c < 1$ there is no phase transition in the thermodynamic
sense; for $1 < c < 2$ the transition is second order, and for $c > 2$ it is first order [31-33].
One finds $c = 3/2 < 2$ if the loops are ideal random walks. Self-avoiding interactions within
a loop modify this value to $c = 3\nu = 1.76$ with $\nu = 0.588$ in $d = 3$. In both cases the
transition is second order. Including self-avoiding interactions between denatured loops and
the rest of the chain was found to produce $c = 2.12 > 2$, driving the transition to first order
[9, 31, 34]. These results suggest that the inclusion of self-avoiding interactions generally
shifts the loop exponent $c$ to larger values, possibly effecting a change of the transition from
second to first order.
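
The classification by the loop exponent can be summarised in a few lines. This is only a sketch: the assignment of the boundary values $c = 1$ and $c = 2$ is an assumption, since the text quotes strict inequalities, and the dictionary keys are hypothetical labels.

```python
def transition_order(c):
    """Order of the denaturation transition as a function of the loop exponent c.
    The boundary cases c = 1 and c = 2 are assigned by assumption."""
    if c <= 1.0:
        return "no transition"
    if c <= 2.0:
        return "second order"
    return "first order"

def loop_exponents(nu=0.588):
    """Loop exponents quoted in the text for d = 3."""
    return {
        "ideal loops": 1.5,                      # c = 3/2
        "self-avoiding loops": 3.0 * nu,         # c = 3*nu
        "loops interacting with chain": 2.12,    # value quoted from the literature
    }
```

Evaluating `transition_order` on the three exponents reproduces the statement above: the first two cases give a second order transition, the third a first order one.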

Using  scaling  arguments  in  the  presence  of  self-avoiding  interactions  within  a  loop  we 
find  a  modified  expression  for  the  statistical  weight  [29] 

$$\Omega(\ell,F) = A s^\ell \ell^{-c}\exp\left(\alpha y^{1/\nu}\ell\right) \qquad (8)$$

for $k = \beta b F \ell^{\nu} \to \infty$ and with the new loop exponent in $d = 3$,

$$c = 4\nu - 1/2 = 1.85. \qquad (9)$$

Thus,  in  the  presence  of  self-avoiding  interactions  within  a  denatured  loop  and  F  >  0  the 
transition  remains  second  order,  but  moves  closer  to  first  order  compared  to  free  DNA  (with 
$c = 3\nu = 1.76$ obtained within the same approach). In the Gaussian limit the same result
obtains  as  in  the  absence  of  the  force  corresponding  to  the  ideal  Hookean  chain  behaviour 
of  a  phantom  chain. 

Within  this  formalism  it  is  also  possible  to  obtain  the  force-extension  behaviour  of  the 
chain as well as the temperature-force phase diagram, see Fig. 4. The shape of the transition
line $f_m(t)$ depends on $A$, $\alpha$, and $s$. Fig. 4a shows $f_m(t)$ for $A = 1$, $\alpha = 1$, and $s = 5$ for
the case that denatured loops are ideal random walks (6 = 0, $\nu = 1/2$). The transition
line for a more realistic value $A \ll 1$ is also shown (here $A = 0.01$). The line $f_m(t)$
separates a finite region of bound states from an infinite region of denatured states. The
point $(t_0, f = 0)$ with $t_0 = t_m(f = 0)$ corresponds to the traditional melting transition for
free DNA ($F = 0$). The line $f_m(t)$ for $A = 1$ contains a region in which $f_m(t)$ decreases with
$t$, such that increased stretching forces $f$ lower the melting temperature, corresponding
to force-induced destabilisation of DNA [30]. Interestingly, for $A = 0.01$ the line $f_m(t)$ is
not monotonic. Moreover, $f_m(t)$ vanishes for both $t \to t_0$ (as $|t - t_0|^{1/2}$) and $t \to 0$ (as
$\alpha^{-1/2}t^{1/2}$). This "reentrant behaviour" [36] means that for given $0 < f_0 < f_{\max}$, where $f_{\max}$
is the maximum of $f_m(t)$, the chain does not only denature at a large $t^{+}(f_0)$ but also at a
small $t^{-}(f_0)$. This behaviour can be traced back to a balance of the terms $(\beta Fa)^2$ and $\beta Fa$
in $z_m(F) = \exp(-\alpha y^2)/s$ and Eq. (4), respectively. For $(\beta Fa)^2 \ll \beta Fa$, i.e., $k_BT \gg Fa$, the
melting transition at $t^{+}(f_0)$ is mainly driven by the entropy gain on creation of fluctuating
loops, similar as for free DNA. For $k_BT \ll Fa$ the transition at $t^{-}(f_0)$ is due to the fact
that $B[z_m(F), \omega, F]$ decreases with $y = \beta Fa = f/t$ in the denatured state, due to the rapid
decay of $z_m(F)$ [cf. Eq. (4)] [90]. Fig. 4b shows the line $f_m(t)$ for self-avoiding loops with
$c = 1.85$ demonstrating analogous behaviour.

FIG. 4: Transition lines $f_m = F_m a/\varepsilon$ as function of $t = k_BT/\varepsilon$ for $\alpha = 1$, $s = 5$ for denatured loops
modelled as (a) ideal random walks and (b) self-avoiding walks. Note the reentrant behaviour at
lower temperatures where the required melting force decreases.

At  very  high  forces  corrections  to  this  treatment  are  expected.  However  the  fact  that 
already at moderate (in fact, any positive) external force $F$ the value of the critical exponent
$c$ changes indicates that the force-induced denaturation employed in single molecule
experiments  is  physically  different  from  thermal  denaturation.  This  is  intuitively  clear  as  the 
pulling  alters  not  only  the  free  energy  of  intact  base-pairs  but  also  the  number  of  accessible 
degrees  of  freedom  of  the  polymer  loops  forming  the  denaturation  bubbles. 

We here treat the DNA denaturation in presence of the external stretching force in analogy
to thermal denaturation. The transition, that is, goes from the double-stranded state to
fully denatured single-strand. While this view is in accord with a large body of experiments
[19, 37-39] and theoretical approaches [30, 40, 41], one cannot exclude the possibility that
an intermediate state of DNA exists, so-called S-DNA. A number of recent contributions
address this question [41-46] but for now this point remains unresolved.

FIG. 5: Clamped DNA domain with internal bps $x = 1$ to $M$, statistical weights $u_{hb}(x)$, $u_{st}(x)$,
and tag position $x_T$. The DNA sequence enters through the statistical weights $u_{st}(x)$ and $u_{hb}(x)$
for disrupting stacking and hydrogen bonds respectively. The bubble breathing process consists of
the initiation of a bubble and the subsequent motion of the forks at positions $x_L$ and $x_R$. See [25]
for details.


III.  SINGLE  DNA  BUBBLE  DYNAMICS 

Below the melting temperature $T_m$, DNA bubbles are intermittent, i.e., they form spontaneously
due to thermal fluctuations and after some time close again. DNA breathing can
be thought of as a biased random walk in the phase space spanned by the bubble size $m$
and its position denoted, e.g., by the left zipper fork position $x_L$ [24, 25]. The bubble creation
can be viewed as a nucleation process, whereas the bubble lifetime corresponds to the
survival time of the first passage problem of relaxing to the $m = 0$ state after a random
walk in the $m > 0$ halfspace [24, 25, 47-49]. Apart from NMR techniques [6, 11] bubble
breathing could be measured on the single DNA-bubble level by fluorescence correlation
spectroscopy [14]. This technique employs a designed stretch of DNA, in which weaker AT
bps form the bubble domain, that is clamped by stronger GC bonds. In the bubble domain,
a fluorophore-quencher pair is attached, see Fig. 5. Once the bubble is created, fluorophore
and quencher are separated, and fluorescence occurs.

A.  Continuum  approach  for  homopolymer  DNA


Originally bubble breathing was considered in a random energy model with scaling arguments
and numerical solution [50] and for a homopolymer by mapping on a Fokker-Planck
equation for a random walker in the bubble free energy landscape with approximate analytical
and numerical solution [47]. An analytical approach to bubble breathing in a homopolymer
DNA with explicit solution for the distribution of bubble lifetimes is indeed possible by
mapping onto the quantum Coulomb problem [51, 52] as we discuss here. In the following
subsection we consider explicitly given DNA sequences in a discrete approach.

The  Poland-Scheraga  free  energy  for  a  single  bubble  has  the  continuum  form  [47,  51] 

$$\mathcal{F} = \gamma_0 + \gamma x + c k_B T \ln x \qquad (10)$$

in terms of the bubble size $x > 0$. Expression (10) corresponds to a logarithmic sink in
$\mathcal{F}$ at $x = 0$ and we recognise from this equation that a characteristic bubble size is set by
$x_1 = c k_B T/|\gamma|$. We rewrite the free stacking energy in terms of $\gamma = \gamma_1(T_m - T)/T_m$ through
the melting temperature $T_m$, and similarly, we introduce $\epsilon = \gamma_1/[2k_B]\,(T_m^{-1} - T^{-1})$.

For large bubble size $x \gg x_1$ the linear term dominates and the free energy grows like
$\mathcal{F} \simeq \gamma_0 + \gamma x$. For small bubbles $x \ll x_1$ [or close to $T_m$, where $\gamma(T) \approx 0$] the free energy
is characterised by the logarithmic sink but has strictly speaking a minimum at $\mathcal{F} = \gamma_0$
for zero bubble size. We distinguish two temperature ranges: (i) For $\gamma < 0$, i.e., $T > T_m$,
$\mathcal{F}$ has a maximum $\mathcal{F}_{\max} = \gamma_0 + c k_B T(\log x_1 - 1)$ at $x = x_1$. The free energy profile
thus defines a Kramers escape problem in the sense that an initial bubble can grow in
size corresponding to the complete denaturation of the double stranded DNA. The escape
probability is $P_{\mathrm{esc}} \propto \exp(-\Delta\mathcal{F}/k_BT)$, where the free energy barrier is $\Delta\mathcal{F} = c k_B T(\log x_1 - 1)$.
Thus

$$P_{\mathrm{esc}} \propto \left(\frac{c k_B T}{|\gamma|}\right)^{-c} \qquad (11)$$

has a power-law dependence on temperature typical for entropic barriers. In contrast a
Kramers escape across a high energetic barrier leads to an Arrhenius behaviour. An example
for the latter would be the initiation process of a bubble during which the barrier $F_s$, entering
through $\sigma_0 = \exp(-\beta F_s)$, needs to be crossed.

(ii) For $\gamma > 0$, i.e., $T < T_m$, the free energy increases monotonically from $\mathcal{F} = \gamma_0$ at
$x = 0$ and the finite size bubbles are stable. The change of sign of $\gamma$ at $T = T_m$ thus defines
the bubble melting.
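
The location and height of the entropic barrier above $T_m$ follow from setting the derivative of the continuum free energy $\gamma_0 + \gamma x + c k_BT\ln x$ to zero. The sketch below, with hypothetical function names and illustrative dimensionless values, evaluates the characteristic size $c k_BT/|\gamma|$, the barrier top, and the power-law factor $(c k_BT/|\gamma|)^{-c}$ of the escape probability.

```python
import math

def bubble_free_energy(x, gamma0, gamma, c, kT=1.0):
    """Continuum Poland-Scheraga free energy of a bubble of size x > 0."""
    return gamma0 + gamma * x + c * kT * math.log(x)

def characteristic_size(gamma, c, kT=1.0):
    """Characteristic bubble size c*kT/|gamma|."""
    return c * kT / abs(gamma)

def barrier_top(gamma0, gamma, c, kT=1.0):
    """Position and height of the free energy maximum, defined only for gamma < 0."""
    if gamma >= 0.0:
        raise ValueError("a maximum exists only above melting (gamma < 0)")
    x1 = characteristic_size(gamma, c, kT)
    return x1, gamma0 + c * kT * (math.log(x1) - 1.0)

def escape_factor(gamma, c, kT=1.0):
    """Power-law factor (c*kT/|gamma|)**(-c) of the escape probability."""
    return characteristic_size(gamma, c, kT) ** (-c)
```

Since $\exp[-c(\log x_1 - 1)] = e^{c}\,x_1^{-c}$, the Boltzmann factor of the barrier and `escape_factor` differ only by the constant $e^{c}$, which is why the temperature dependence is a power law rather than an Arrhenius form.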

The  gradient  of  the  free  energy  profile  then  enters  as  force  term  in  a  Langevin  equation 
for  the  bubble  size  x.  Such  a  treatment  is  possible  since  x  is  the  slow  variable  of  the  system 
compared  to  the  polymeric  degrees  of  freedom  of  a  bubble  and  even  the  entire  chain  unless 
the  chain  size  becomes  too  large.  The  Langevin  equation  can  then  be  mapped  onto  the 
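Fokker-Planck equation for the probability density $P(x,t)$ to find a bubble of size $x$ at time
$t$:

$$\frac{\partial P(x,t)}{\partial t} = D\left[\frac{\partial}{\partial x}\left(\frac{c}{x} + \frac{\gamma}{k_BT}\right) + \frac{\partial^2}{\partial x^2}\right]P(x,t). \qquad (12)$$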


Here $D$ is the noise strength of the thermal environment measured in units of $k_BT$ and time.
It is now the task to derive from this dynamical description physically relevant and measurable
quantities. These are the bubble lifetime and its distribution as well as autocorrelation
functions of the bubble dynamics. We here concentrate on the former while addressing the
autocorrelation function in the subsequent section dealing with the discrete formalism. More
details on the autocorrelation function in the continuum limit can be found in Refs. [51-53].

The single bubble dynamics can be analysed in different ways: in terms of the
underlying Langevin equation, including the interpretation of the single bubble dynamics
below the melting temperature as a noisy finite time singularity; by a weak noise
analysis, which allows one to interpret the dynamics through orbits in phase space portraits;
or, finally, through the Fokker-Planck equation (12). For more details we refer to
Refs. [47, 51, 52].

To determine the lifetime distribution of a bubble once opened we face a technical problem
posed by the $c/x$ term in the drift term of Eq. (12). One way to circumvent this is to map
this Fokker-Planck equation onto the corresponding imaginary time Schrödinger equation
of the quantum Coulomb problem [51, 52]. From this formulation one is able to deduce the
behaviour of the bubble lifetime. We distinguish three cases.

(i) Below the melting temperature. At $T < T_m$ one can determine the density of the
bubble lifetime distribution analytically in the long time limit obtaining

$$p(t) \sim x_0^{1+c}\,e^{|\epsilon|x_0}\,e^{-\epsilon^2 t/2}\,t^{-3/2-c/2}. \qquad (13)$$


Thus, with $x_0$ the initial bubble size, we observe a power-law behaviour $t^{-3/2-c/2}$ with an exponential cutoff at $\tau = 2/\epsilon^2$
such that the bubble lifetime is always finite. This form for $p(t)$ generalises the expression
of the first passage time density of a bubble without entropy loss correction (i.e., $c = 0$)
with constant drift $|\epsilon|$ towards bubble closure [47]. For the mean bubble lifetime we find the
approximate expression

$$T = \int_0^\infty t\,p(t)\,dt \simeq \frac{x_0}{|\epsilon|}\,\frac{K_{(c-1)/2}(x_0|\epsilon|)}{K_{(c+1)/2}(x_0|\epsilon|)}. \qquad (14)$$


For sufficiently large values of $x_0|\epsilon|$ the ratio of the two Bessel functions tends to
1; in particular, for $c = 2$ we find $K_{1/2}(x_0|\epsilon|)/K_{3/2}(x_0|\epsilon|) =
1/(1 + 1/(x_0|\epsilon|))$. This result for $T$ includes the characteristic bubble lifetime $x_0/|\epsilon|$ when the
loop entropy correction is neglected ($c = 0$) [47].

(ii) At the melting temperature. Right at $T = T_m$ the drift exerted by the free stacking
energy $\epsilon$ vanishes, and the dynamics is almost free diffusion. The result for the density of
bubble lifetimes reads

$$p(t) = \frac{2x_0^{1+c}}{\Gamma(1/2 + c/2)}\,e^{-x_0^2/(2t)}\,(2t)^{-3/2-c/2} \qquad (15)$$

and is normalised and exact for all times. In this case the power-law $t^{-3/2-c/2}$ determines
the long time behaviour. While for free diffusion ($c = 0$) the corresponding mean bubble
lifetime $\int_0^\infty t\,p(t)\,dt$ diverges [47], for all $c > 1$ we encounter the mean bubble lifetime

$$T = \frac{x_0^2}{c - 1} \qquad (16)$$

which interestingly grows like the square of the initial bubble size $x_0$ in contrast to the linear
scaling in the case of diffusion with linear drift in case (i). In addition to the finite mean
bubble lifetime a value $c > 2$ would also cause a power-law decay of the
associated correlation function $C(t)$ at long times in contrast to the plateau $C(t) \sim 1$ reached for
$1 < c < 2$ [53].
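
The normalisation of the lifetime density at $T_m$ and its mean $x_0^2/(c-1)$ can be verified by direct quadrature. The sketch below assumes the density has the form $p(t) = 2x_0^{1+c}\,\Gamma(1/2+c/2)^{-1}\,e^{-x_0^2/(2t)}\,(2t)^{-3/2-c/2}$ and integrates on a logarithmic time grid; the grid limits and function names are illustrative choices.

```python
import math

def lifetime_density_at_tm(t, x0, c):
    """Bubble lifetime density at the melting temperature (assumed form of Eq. (15))."""
    prefactor = 2.0 * x0 ** (1.0 + c) / math.gamma(0.5 + 0.5 * c)
    return prefactor * math.exp(-x0 * x0 / (2.0 * t)) * (2.0 * t) ** (-1.5 - 0.5 * c)

def lifetime_moment(k, x0, c, s_min=-8.0, s_max=60.0, n=40001):
    """Estimate int_0^inf t**k p(t) dt with the substitution t = exp(s)
    and the trapezoidal rule in s; the heavy tail requires a large s_max."""
    h = (s_max - s_min) / (n - 1)
    total = 0.0
    for i in range(n):
        t = math.exp(s_min + i * h)
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * t ** (k + 1) * lifetime_density_at_tm(t, x0, c)
    return total * h
```

For instance, with `x0 = 2.0` and `c = 1.76`, `lifetime_moment(0, 2.0, 1.76)` should be close to 1 and `lifetime_moment(1, 2.0, 1.76)` close to $x_0^2/(c-1)$. For $c \le 1$ the first moment no longer converges, in line with the divergence noted for free diffusion.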

(iii) Above the melting temperature. At $T > T_m$ the situation is opposite to case (i);
namely the drift is now directed towards the complete denaturation of the chain. In a long
chain the one bubble picture would no longer hold and bubble coalescence needs to be taken
into account. However in shorter DNA constructs preferring one single bubble the density
of bubble lifetimes would decay exponentially [51, 52].


B.  Discrete  approach  and  sequence  dependence 

The  natural  coordinate  for  the  unzipping  and  zipping  of  base  pairs  in  DNA  breathing 
dynamics  is  the  location  x  of  a  respective  base  pair  along  the  chemical  backbone  of  the  DNA 
molecule. By its very nature this is a discrete variable. While in the continuum approach
one  may  include  certain  given  distributions  of  more  and  less  stable  regions  (predominantly 
GC-rich  versus  predominantly  AT-rich)  the  use  of  a  truly  discrete  x  allows  one  to  consider 
any  given  sequence.  This  is  of  particular  importance  when  analysing  actual  biologically 
relevant  sequences  or  those  designed  sequences  that  are  used  in  a  given  experiment.  Such 
a  discrete  approach  in  terms  of  the  master  equation  will  be  described  here.  We  note  that 
a  disadvantage  of  this  method  is  the  limited  system  size  one  can  de  facto  analyse  due  to 
computational  constraints. 

With a discrete coordinate we are also able to explicitly distinguish hydrogen bonding
and stacking energies, and we use the free energy parameters from Krueger et al. [6].
For the setup sketched in Fig. 5 we then find the partition function. The positions $x_L$ and
$x_R$ of the zipper forks correspond to the closed bps adjacent to the left and right end of the
bubble, respectively. $x_L$ and $x_R$ are stochastic variables, whose time evolution in the energy
landscape defined by the partition factor ($m \ge 1$)

$$\mathscr{Z}(x_L, m) = \frac{\xi'}{(1+m)^c} \prod_{x=x_L+1}^{x_L+m} u_{\rm hb}(x) \prod_{x=x_L+1}^{x_L+m+1} u_{\rm st}(x) \qquad (17)$$

characterises the bubble dynamics. $\mathscr{Z}$ is written in terms of $x_L$ and bubble size
$m = x_R - x_L - 1$, with $\mathscr{Z}(m = 0) = 1$. Here, $\xi' = 2^c \xi$, where $\xi \approx 10^{-3}$ is the ring factor for
bubble initiation from Ref. [6] that is related to the cooperativity parameter $\sigma_0 \approx 10^{-5}$ [7, 54]
by $\sigma_0 = \xi \exp(\epsilon_{\rm st})$ [6]. For the entropy loss on forming a closed polymer loop we assign the
factor $(1 + m)^{-c}$ [54, 55] and take $c = 1.76$ for the critical exponent [33]. This corresponds
to the Flory form $3\nu$ of the exponent in the entropy loss factor for a polymer ring with excluded volume.
The best known value for $\nu$ is 0.588 [56-58]. Note that there exist alternative models taking
into account the self-avoiding interactions of the bubble with the rest of the chain, leading
to an increased value for c ($c \approx 2.1$) such that the denaturation transition becomes first
order [31, 33]. Note also that a bubble with m open bps requires breaking of m hydrogen
bonds and m + 1 stacking interactions.

The zipper forks move stepwise, $x_{L/R} \to x_{L/R} \pm 1$, with rates $t^{\pm}_{L/R}(x_L, m)$. We define for
bubble size decrease

$$t_L^+(x_L, m) = t_R^-(x_L, m) = k/2 \quad (m \ge 2) \qquad (18)$$

for the two forks [91]. The rate k characterises a single bp zipping. Its independence of x
corresponds to the view that bp closure requires the diffusional encounter of the two bases
and bond formation; as sterically AT and GC bps are very similar, k should not significantly
vary with bp stacking. k is the only adjustable parameter of our model, and has to be
determined from experiment or future MD simulations. The factor 1/2 is introduced for
consistency [48, 49]. Bubble size increase is controlled by

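$$t_L^-(x_L, m) = k\, u_{\rm st}(x_L)\, u_{\rm hb}(x_L)\, s(m)/2,$$

$$t_R^+(x_L, m) = k\, u_{\rm st}(x_R + 1)\, u_{\rm hb}(x_R)\, s(m)/2, \qquad (19)$$

for $m \ge 1$, where $s(m) = \{(1 + m)/(2 + m)\}^c$. Finally, bubble initiation and annihilation
from and to the zero-bubble ground state, $m = 0 \leftrightarrow 1$, occur with rates

$$t_G^+(x_L) = k\, \xi'\, s(0)\, u_{\rm st}(x_L + 1)\, u_{\rm hb}(x_L + 1)\, u_{\rm st}(x_L + 2),$$
$$t_G^-(x_L) = k. \qquad (20)$$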

The rates t fulfil detailed balance conditions. The annihilation rate $t_G^-(x_L)$ is twice the
zipping rate of a single fork, since the last open bp can close either from the left or right.
Due to the clamping, $x_L \ge 0$ and $x_R \le M + 1$, ensured by the reflecting conditions $t_L^-(0, m) =
t_R^+(x_L, M - x_L) = 0$. The rates t together with the boundary conditions fully determine the
bubble dynamics.
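
The detailed balance property stated above can be checked numerically. The following Python sketch is illustrative only: it assumes a partition factor of the form $\xi' (1+m)^{-c} \prod u_{\rm hb} \prod u_{\rm st}$ with m hydrogen-bond and m + 1 stacking factors, as described in the text, and the function names and the 12-bp Boltzmann-factor profile are hypothetical placeholders, not the parameters of Ref. [6].

```python
import math

C, XI, K = 1.76, 1e-3, 1.0      # loop exponent c, ring factor xi, zipping rate k (time unit 1/k)

def s(m):
    """Loop-entropy ratio s(m) = ((1 + m) / (2 + m))**c."""
    return ((1 + m) / (2 + m)) ** C

def weight(xl, m, u_hb, u_st):
    """Statistical weight of a bubble of m open bps with left fork at xl (assumed form)."""
    if m == 0:
        return 1.0
    return (2**C * XI / (1 + m) ** C
            * math.prod(u_hb[xl + 1: xl + m + 1])    # m hydrogen bonds
            * math.prod(u_st[xl + 1: xl + m + 2]))   # m + 1 stacking interactions

def t_open_left(xl, m, u_hb, u_st):
    """Opening at the left fork, x_L -> x_L - 1."""
    return K * u_st[xl] * u_hb[xl] * s(m) / 2

def t_open_right(xl, m, u_hb, u_st):
    """Opening at the right fork, x_R -> x_R + 1, with x_R = x_L + m + 1."""
    xr = xl + m + 1
    return K * u_st[xr + 1] * u_hb[xr] * s(m) / 2

def t_init(xl, u_hb, u_st):
    """Bubble initiation from the closed ground state."""
    return K * 2**C * XI * s(0) * u_st[xl + 1] * u_hb[xl + 1] * u_st[xl + 2]

# Made-up Boltzmann factors for a 12-bp stretch (purely illustrative numbers).
u_hb = [0.5, 0.8, 1.1, 0.7, 0.9, 1.3, 0.6, 1.0, 0.75, 1.2, 0.85, 0.95]
u_st = [1.1, 0.9, 0.6, 1.4, 0.8, 1.0, 1.2, 0.7, 0.95, 1.05, 0.65, 1.3]
xl, m = 4, 3

# Detailed balance: (opening rate) * weight(before) == (closing rate) * weight(after).
left_ok = math.isclose(t_open_left(xl, m, u_hb, u_st) * weight(xl, m, u_hb, u_st),
                       (K / 2) * weight(xl - 1, m + 1, u_hb, u_st))
right_ok = math.isclose(t_open_right(xl, m, u_hb, u_st) * weight(xl, m, u_hb, u_st),
                        (K / 2) * weight(xl, m + 1, u_hb, u_st))
init_ok = math.isclose(t_init(xl, u_hb, u_st) * weight(xl, 0, u_hb, u_st),
                       K * weight(xl, 1, u_hb, u_st))
print(left_ok, right_ok, init_ok)
```

Each comparison pairs an opening step with the zipping step that reverses it, so the ratio of the two rates equals the ratio of the corresponding statistical weights.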

In the FCS experiment fluorescence occurs if the bps in a $\Delta$-neighbourhood of the
fluorophore position $x_T$ are open [14]. Measured fluorescence time series thus correspond to
the stochastic variable $I(t)$, which takes the value 1 if at least all bps in $[x_T - \Delta, x_T + \Delta]$ are
open, and 0 otherwise. The time averaged (denoted by an overline) fluorescence autocorrelation

$$A_t(x_T, t) = \overline{I(t) I(0)} - \overline{I(t)}^2 \qquad (21)$$

for the sequence AT9 from [14] is shown in rescaled form in Fig. 6.

DNA breathing is described by the probability distribution $P(x_L, m, t)$ to find a bubble
of size m located at $x_L$, whose time evolution follows the master equation

$$\frac{\partial}{\partial t} P(x_L, m, t) = \mathbb{W} P(x_L, m, t). \qquad (22)$$

The transfer matrix $\mathbb{W}$ incorporates the rates t. Detailed balance guarantees equilibration
toward

$$P_{\rm eq}(x_L, m) = \lim_{t \to \infty} P(x_L, m, t) = \frac{\mathscr{Z}(x_L, m)}{\mathscr{Z}}, \qquad (23)$$

FIG. 6: Scaling plot of $A_t(x_T, t)$ at various T for the sequence AT9 from [14] as indicated in the
figure. This experimental construct is designed with a weak AT-rich bubble domain in the core,
a GC clamp at both ends and an additional bulge loop of DNA single strand consisting of four T
bases. The symbols represent experimental data at various temperatures, see Refs. [24, 25] for
more details. We also include results from our master equation model. Inset: Relaxation time
spectrum. See text for more details.

with $\mathscr{Z} = \sum_{x_L, m} \mathscr{Z}(x_L, m)$ [48, 49, 59]. The master equation and the explicit construction
of $\mathbb{W}$ are discussed at length in Refs. [25, 48, 49, 60]. Eigenmode analysis and matrix
diagonalisation produce all quantities of interest, such as the ensemble averaged autocorrelation
function

$$A(x_T, t) = \langle I(t) I(0) \rangle - \langle I \rangle^2. \qquad (24)$$

$\langle I(t) I(0) \rangle$ is proportional to the survival density that the bp is open at t and that it was
open initially [24, 60].
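
As a small consistency check of the statement that detailed balance leads to equilibration toward the normalised statistical weights, the sketch below builds the transitions of the master equation for a short homopolymer and verifies that $P_{\rm eq} \propto \mathscr{Z}$ is stationary. It is a minimal illustration under stated assumptions: the homopolymer Boltzmann factors `UHB` and `UST`, the system size `M`, and all function names are hypothetical, and the rates are those of Eqs. (18)-(20) as read here, with time measured in units of 1/k.

```python
C, XI, K = 1.76, 1e-3, 1.0   # loop exponent c, ring factor xi, zipping rate k
UHB, UST = 0.7, 1.2          # hypothetical homopolymer Boltzmann factors
M = 5                        # number of internal bps between the clamps

def s(m):
    return ((1 + m) / (2 + m)) ** C

def weight(m):
    """Statistical weight of a bubble of size m (homopolymer: independent of x_L)."""
    if m == 0:
        return 1.0
    return 2**C * XI / (1 + m) ** C * UHB**m * UST ** (m + 1)

def transitions(state):
    """List of (rate, new_state) out of state = (x_L, m); (0, 0) is the closed ground state."""
    xl, m = state
    if m == 0:
        t_init = K * 2**C * XI * s(0) * UST * UHB * UST
        return [(t_init, (x, 1)) for x in range(M)]
    moves = []
    t_open = K * UST * UHB * s(m) / 2
    if xl >= 1:                                   # reflecting condition at the left clamp
        moves.append((t_open, (xl - 1, m + 1)))
    if xl + m <= M - 1:                           # reflecting condition at the right clamp
        moves.append((t_open, (xl, m + 1)))
    if m >= 2:
        moves.append((K / 2, (xl + 1, m - 1)))    # left fork zips
        moves.append((K / 2, (xl, m - 1)))        # right fork zips
    else:
        moves.append((K, (0, 0)))                 # last open bp closes
    return moves

states = [(0, 0)] + [(xl, m) for m in range(1, M + 1) for xl in range(M - m + 1)]
z_total = sum(weight(m) for _, m in states)
p_eq = {st: weight(st[1]) / z_total for st in states}

# Right-hand side of the master equation evaluated at P = P_eq (gain minus loss).
dpdt = {st: 0.0 for st in states}
for st in states:
    for rate, new in transitions(st):
        flow = rate * p_eq[st]
        dpdt[st] -= flow
        dpdt[new] += flow

residual = max(abs(v) for v in dpdt.values())
print(residual)
```

The residual vanishes to rounding error, which is the stationarity expected from detailed balance. The eigenmode analysis itself is not attempted in this sketch.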

In Fig. 6 the blue curve shows the predicted behaviour of $A(x_T, t)$, calculated for $T = 49^\circ$C
with the parameters from [6]. As in the experiment we assumed that fluorophore and
quencher attach to bps $x_T$ and $x_T + 1$, both of which are required to be open to produce a
fluorescence signal. From the scaling plot, we calibrate the zipping rate as $k = 7.1 \times 10^4$/s, in good
agreement with the findings from Ref. [14]. The calculated behaviour reproduces the data
within the error bars, while the model prediction at $T = 35^\circ$C shows a more pronounced
deviation. Potential causes are destabilising effects of the fluorophore and quencher, and
additional modes that broaden the decay of the autocorrelation. The latter is underlined
by the fact that for lower temperatures the relaxation time distribution $f(\tau)$, defined by
$A(x_T, t) = \int \exp(-t/\tau) f(\tau)\, d\tau$, becomes narrower (Fig. 6 inset). Deviations may also be
associated with the correction for diffusional motion of the DNA construct, measured without
quencher and neglecting contributions from internal dynamics [61]. Indeed, the black curve
shown in Fig. 6 was obtained by a 3% reduction of the diffusion time [92], which should
roughly account for the presence of the quencher.

Stochastic simulation. Based on the rates t, stochastic simulations give access to single
bubble fluctuations [62]. The corresponding Gillespie algorithm uses the joint probability
density of waiting time $\tau$ and path $\mu = \pm$,

$$P(\tau, \mu, \nu) = t_\nu^\mu(x_L, m) \exp\Big( -\tau \sum_{\mu, \nu} t_\nu^\mu(x_L, m) \Big), \qquad (25)$$

defining for a given state $(x_L, m)$ after what time $\tau$ the next step of fork $\nu \in \{L, R\}$ occurs.
The formulation via the waiting time density P is computationally economical, avoiding
the large number of unsuccessful opening attempts in traditional Langevin simulations. Using
(25) we obtain the single bubble time series in Fig. 7 for two different tag positions in the
bacteriophage T7 promoter sequence

1  20
|  |
5'-ATGACCAGTTGAAGGACTGGAAGTAATACGACTC
AGTATAGGGACAATGCTTAAGGTCGCTCTCTAGGAG-3'  (26)
|  |  |
38  41  68

whose TATA motif is underlined [23]. A promoter is a sequence (often containing the
so-called TATA motif) placed at the start of a gene, to which RNA polymerase is recruited
to initiate transcription [63]. Motifs such as TATA are believed to assist polymerase during
transcription initiation [22, 25]. Fig. 7 shows the signal $I(t)$ at $37^\circ$C for the tag positions
$x_T = 38$ in the core of TATA, and $x_T = 41$ at the second GC bp after TATA. Bubble events
occur much more frequently in TATA (the TA/AT stacking interaction is particularly weak
[6]). This is quantified by the density of waiting times $\psi(\tau)$ spent in the $I = 0$ state, whose
characteristic time scale $\tau' = \int_0^\infty d\tau\, \tau \psi(\tau)$ is more than an order of magnitude longer
at $x_T = 41$ than at $x_T = 38$. In contrast, we observe similar behaviour for the density of opening times
$\phi(\tau)$ for $x_T = 38$ and 41. The solid lines are the results from the master equation showing

FIG. 7: Top: Time series $I(t)$ for the T7 promoter, for the opening of base pairs at labels $x_T =
38$ (in the TATA motif) and 41 (in the adjacent GC region). Middle: Density of fluorescence times $\phi(\tau)$,
corresponding to the bubble lifetime, and of waiting times $\psi(\tau)$ elapsing between bubble events. While
the bubble lifetimes in both regions of the sequence are approximately equivalent, the occurrence
frequency of bubbles is indeed significantly higher within the TATA domain. Bottom: Mean
fluorescence time for $\Delta = 0$ for parameter sets from Blake et al. [54] and Krueger et al. [6].
One recognises the much stronger sequence sensitivity for the parameters from Krueger et al. The
shaded area corresponds to the TATA domain. Again the lifetime does not appear to significantly
distinguish the TATA domain. In contrast the simultaneous opening of 4 sequential base pairs
clearly favours opening of the motif [24, 25].

excellent agreement with the results from the Gillespie stochastic simulation. Notice that
whereas $\psi(\tau)$ is characterised by a single exponential, $\phi(\tau)$ shows a crossover between different
regimes. For long times both $\psi(\tau)$ and $\phi(\tau)$ decay exponentially, as they should for a finite
DNA stretch.
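
The stochastic simulation scheme described above can be sketched as follows. This is a minimal illustration of the Gillespie direct method (exponential waiting time drawn from the total exit rate, move chosen in proportion to its rate), not the code used for Fig. 7: it assumes a clamped homopolymer with hypothetical Boltzmann factors `UHB` and `UST` instead of the T7 sequence, measures time in units of 1/k, and all names are placeholders.

```python
import random

C, XI, K = 1.76, 1e-3, 1.0   # loop exponent c, ring factor xi, zipping rate k
UHB, UST = 0.9, 0.9          # hypothetical homopolymer Boltzmann factors
M = 10                       # number of internal bps between the clamps

def s(m):
    return ((1 + m) / (2 + m)) ** C

def transitions(state):
    """List of (rate, new_state) out of state = (x_L, m); (0, 0) is the closed ground state."""
    xl, m = state
    if m == 0:
        t_init = K * 2**C * XI * s(0) * UST * UHB * UST
        return [(t_init, (x, 1)) for x in range(M)]
    moves = []
    t_open = K * UST * UHB * s(m) / 2
    if xl >= 1:
        moves.append((t_open, (xl - 1, m + 1)))   # left fork opens one bp
    if xl + m <= M - 1:
        moves.append((t_open, (xl, m + 1)))       # right fork opens one bp
    if m >= 2:
        moves.append((K / 2, (xl + 1, m - 1)))    # left fork zips
        moves.append((K / 2, (xl, m - 1)))        # right fork zips
    else:
        moves.append((K, (0, 0)))                 # last open bp closes
    return moves

def gillespie(state, t_max, rng):
    """Return the list of (time, state) jump events up to time t_max."""
    t, series = 0.0, [(0.0, state)]
    while True:
        moves = transitions(state)
        total = sum(rate for rate, _ in moves)
        t += rng.expovariate(total)        # waiting time with density total * exp(-total * tau)
        if t > t_max:
            return series
        pick = rng.random() * total        # select a move with probability rate / total
        for rate, new_state in moves:
            pick -= rate
            if pick < 0:
                break
        state = new_state                  # falls back to the last move on rounding error
        series.append((t, state))

series = gillespie((0, 0), 1.0e4, random.Random(1))
open_events = sum(1 for _, (xl, m) in series if m == 1)
print(len(series), open_events)
```

A fluorescence signal $I(t)$ for a given tag position could be derived from `series` by testing whether the tagged bps lie inside the open interval of each state; that step is left out here.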

C.  Bubbles  in  biological  sequences 

Having presented our results for the T7 promoter sequence above, in this section we comment
on the biological relevance of the distribution of soft and hard zones, in particular with
respect to transcription initiation. A more detailed analysis can be found in [22-25].

Let us start by briefly commenting on the biochemical relevance of the TATA box motif
(also referred to as the Goldberg-Hogness box). It is a DNA sequence (cis-regulatory element)
found in the promoter region of most genes in eukaryotes and a group of single-celled
microorganisms called archaea. Similar binding motifs with similar properties exist in other
organisms. The TATA box is the binding site of transcription factors and is involved in
the process of transcription by RNA polymerase. Its core sequence is 5'-TATAAA-3' or a
variant, usually followed by three or more adenine bases. Commonly it is located 25 base
pairs upstream of the transcription start site. The TATA box is normally bound by the TATA
Binding Protein (TBP) during transcription. The TBP unwinds the DNA and strongly
bends it. At a later stage the TATA box is bound by RNA polymerase and transcription
commences. [93] The high proneness towards bubble formation at the TATA box is therefore
believed to actively contribute to transcription initiation. [94]

1. Bacteriophage T7 core promoter.

Its  sequence  is  displayed  in  Eq.  (26).  It  contains  the  TATA  box  at  base  pair  labels  36  to 
39.  Fig.  8  shows  the  equilibrium  probabilities  for  the  base  pairs  to  be  open.  In  this  example 
the  TATA  box  is  located  right  next  to  the  transcription  start  site.  From  the  graph  one 
can  see  that  indeed  the  simultaneous  opening  probability  of  four  base  pairs  is  significantly 
increased  at  the  position  of  the  TATA  box.  Note  the  level  of  the  opening  probability  of 
a  random  sequence  also  drawn  in  the  figure.  Accordingly  several  domains  of  significantly 
increased  bubble  probability  exist  along  this  sequence. 

2.  Adenovirus  Major  Late  Promoter. 

Its  86  base  pair  sequence 


1  31
|  |
5'-GCCACGTGACCAGGGGTCCCCGCCGGGGGGGT
ATAAAAGGGGCGGACCTCTGTTCGTCCTC  (27)
ACTGTCTTCCGGATCGCTGTCCAG-3'
|  |
TSS  84

FIG. 8: Equilibrium opening probability of base pairs in the sequence of the bacteriophage T7 core
promoter.

contains a transcription start site at the position labelled TSS, compare Fig. 9. In this
example the TATA box is located upstream, at base pair label -29, and is far more likely
to open simultaneously than any other domain along the sequence.

3.  Adeno  Associated  Viral  P5  promoter. 

This  sequence  consists  of  the  69  base  pairs 

1  25
|  |
5'-GTGGCCATTTAGGGTATATATGGCCGAGTGAGCGA  (28)
GCAGGATCTCCATTTTGACCGCGAAATTTGAACG-3'
|  |
TSS  67

and supports binding of TBP at the TATA box as well as the binding of the Yin Yang 1
(YY1) transcription factor. YY1 is known to interact with the TBP [64]. YY1 binds to
a specific sequence element of the form CCATNTT, marked blue in the sequence. As can be
seen from Fig. 10 these two binding motifs have a significantly higher cooperative opening

FIG. 9: Equilibrium opening probability of base pairs in the sequence of the Adenovirus Major
Late Promoter.


FIG.  10:  Equilibrium  opening  probability  of  base  pairs  in  the  sequence  of  the  Adeno  Associated 
Viral  P5  promoter. 

probability than any other sequence element of this promoter. The analysis also shows a
broader but lower peak around the transcription start site.

In summary this analysis shows that local instability of the DNA sequence indeed appears
to occur at specific binding sequences for proteins involved in transcription initiation.
Whether it is just the lower free energy needed to break these sequences or indeed rare
bubble openings at these sites that help the protein binding remains an open question.

IV.  DNA  BUBBLE  COALESCENCE 

It has been shown in a quantitative analysis that the experimentally accessible autocorrelation
function is sensitive to the stacking parameters of DNA [24, 25]. However, it has not
been fully appreciated to what extent the fluorophore and quencher molecules that are
attached to the DNA construct in the experiments reported in Refs. [14, 61, 65] influence the
stability of DNA. Moreover, the zipping rates measured in the single molecule fluorescence
setup differ from those determined in NMR experiments [11, 14]. We here propose and study
an alternative setup for the single molecule fluorescence investigation of DNA breathing, as
shown in Fig. 11, that may improve and complement the single molecule data obtained from
a DNA construct with a single bubble domain. [95] In this setup, a short stretch of DNA,
clamped at both ends, is designed such that two soft zones consisting of weaker AT bps are
separated by a more stable barrier region rich in GC bps. For simplicity, we assume that
both soft zones and barrier are homopolymers with bp-dissociation free energies $\Delta G_s$ and
$\Delta G_b$, respectively, and, in accordance with the experimental findings of reference [14], we
neglect secondary structure formation in the barrier zone. At temperatures higher than the
melting temperature $T_s$ of the soft zones but still lower than the melting temperature $T_b$ of
the barrier region, thermal fluctuations will gradually dissociate the barrier, until the two
bubbles coalesce. Once the bubbles have coalesced, the free energy corresponding to one
cooperativity factor $\sigma_0 \approx 10^{-5} \ldots 10^{-3}$ is released, stabilising the coalesced bubble against reclosure of
the barrier. Moreover there exists a significant dynamic barrier stemming from the necessity
of diffusional encounter of the two bases in order to reanneal the barrier. Both points lead to
a long lifetime of the coalesced state. This fact should allow for a meaningful measurement
of the coalescence time in experiment, and therefore provide a new and sensitive method to
measure DNA stability data and base pair zipping rates. We also study the case when the
system is prepared as above and then T is suddenly increased such that $T > T_b > T_s$, so that
the system is driven towards coalescence. In both cases the two boundaries between bubbles
and barrier perform a (biased) random walk in opposite free energy potentials.

FIG. 11: Schematic of the DNA construct for bubble coalescence. Note that the positions of both
ends of the barrier region are measured from the same point (the position of the leftmost barrier
base pair).


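The statistical weight of the construct before coalescence,

$$\mathscr{Z}_{x,y} = \left(\xi e^{N_L \beta\epsilon'}\right) e^{(x - y + N)\beta\epsilon} \left(\xi e^{N_R \beta\epsilon'}\right), \qquad (29)$$

at $T_b > T > T_s$ involves the cooperativity factor $\xi \approx 10^{-5}$ for each bubble, and a Boltzmann
factor for each broken bp with free energies $\epsilon' > 0$ and $\epsilon < 0$, compare reference [24]. Upon
coalescence, the boundary free energy corresponding to one factor $\xi$ is released,

$$\mathscr{Z}_{\rm coal} = \xi e^{(N_L + N_R)\beta\epsilon' + N\beta\epsilon}, \qquad (30)$$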


stabilising  the  system  against  immediate  transition  back  to  a  two-bubble  state.  It  is  this 
distinctive  feature  that  should  render  this  setup  an  interesting  model  system  for  single 
molecule  analyses  of  DNA  denaturation  dynamics  as  the  coalesced  state  can  be  determined 
by  measuring  first  passage  time  statistics  (corresponding  to  the  introduction  of  an  absorbing 
boundary  condition  at  the  point  of  coalescence). 
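
To make the size of this stabilisation concrete, one can compare Eqs. (29) and (30) at the moment of coalescence. The short calculation below is a sketch under two assumptions: that coalescence corresponds to all N barrier bps being broken, i.e. $x = y$ in Eq. (29), and that the order-of-magnitude value $\xi \approx 10^{-5}$ quoted above applies.

```latex
\frac{\mathscr{Z}_{\rm coal}}{\mathscr{Z}_{x,x}}
  = \frac{\xi\, e^{(N_L+N_R)\beta\epsilon' + N\beta\epsilon}}
         {\xi^{2}\, e^{(N_L+N_R)\beta\epsilon' + N\beta\epsilon}}
  = \frac{1}{\xi} \approx 10^{5},
\qquad
\Delta F = -k_B T \ln\frac{1}{\xi} \approx -11.5\, k_B T .
```

Under these assumptions the coalesced state is favoured by roughly $11.5\,k_B T$ over the two-bubble state with a fully open barrier, independently of the barrier length N.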

In our analysis we use a continuum approach to the stochastic motion of the two zipping
forks at either end of the barrier zone, with locations x and y. The probability density
$P(x, y, t)$ then follows the bivariate Fokker-Planck equation [66]

$$\frac{\partial}{\partial t} P(x, y, t) = \left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} - 2f \frac{\partial}{\partial x} + 2f \frac{\partial}{\partial y} \right] P(x, y, t), \qquad (31)$$

with the dimensionless force $f = N(u - 1)/(1 + u)$ and time rescaled by $k(1 + u)/2N^2$.
Equation (31) is completed by the initial condition $P(x, y, 0) = \delta(x - x_0)\delta(y - y_0)$ and the

FIG. 12: Trajectories of the random motion of the two bubble forks.

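reflecting boundary conditions (the bubbles in the soft zones are assumed to be open at all
times)

$$\left( \frac{\partial}{\partial x} - 2f \right) P \Big|_{x=0} = \left( \frac{\partial}{\partial y} + 2f \right) P \Big|_{y=1} = 0. \qquad (32)$$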

Moreover, we impose the absorbing boundary condition $P(x, x, t) = 0$. This defines the
vicious walker property [67], terminating the process when the two walkers meet. The fact
that the two walkers move in opposite potentials actually makes this problem a previously
unsolved case of vicious walker models [66].

Typical examples of individual trajectories resulting from a Gillespie algorithm are
displayed in figure 12, where traces of the two interfaces (forks) bounding the barrier region
are shown. Bubble coalescence terminates each pair of trajectories.

The analysis in Ref. [66] reveals the distribution of coalescence positions (i.e., where the
two zipper forks eventually meet) and the coalescence times, as shown in Fig. 13. (i) The curves
for the PDF $p(x)$ of the coalescence position exhibit a pronounced crossover from a relatively
sharply peaked form to an almost flat behaviour. The former occurs for large positive force
f, corresponding to a strong drift toward a potential well, with negligible influence of the
boundary conditions. In contrast, for large negative f, corresponding to a high barrier for
coalescence, the insensitivity of $p(x)$ to the position x can be explained in terms of a simple
Arrhenius argument: The probability of the walker to be at a position x is proportional
to the Boltzmann weight, $\exp(-\beta\varphi(x))$, where $\varphi(x) = -\int^x F(x')\,dx'$ is the free energy
corresponding to the force $F(x)$. Then, the joint probability to have both walkers meet at the
same position is given by the product $\exp(-\beta[\varphi_L(x) + \varphi_R(x)]) \approx$ const, as the two walkers are
in opposite linear potentials and the position dependence of the exponent cancels out. This

FIG. 13: Left: Distribution of coalescence positions within the rescaled barrier zone [0, 1]. Right:
Distribution of coalescence times.

simple picture necessarily breaks down close to the boundaries. (ii) The f-dependence of the
mean first passage time $\tau$ crosses over from the $\tau \sim 1/f$ behaviour typical for diffusion in a
strong positive force pushing the two walkers together, to the exponential form $\tau \sim \exp(2|f|)$
of the associated Kramers problem. The former problem was studied in reference [47] by
neglecting the boundaries and switching to the relative coordinate description, which enables
one to find the analytic result $\tau = 1/(4f)$. For the Kramers problem ($f \ll -1$) the analytic
solutions $p(x) = [1 - e^{-2|f|x} - e^{-2|f|(1-x)}]\,|f|/(|f| - 1)$ and $\tau = e^{2|f|}/[16 f^2 (|f| - 1)]$ can
be found rather easily [68] by expansion into the lowest two eigenmodes of $p(x, t|x_0)$.
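
The closed-form expressions quoted above are easy to evaluate numerically. The sketch below is illustrative only and uses hypothetical function names. It codes $p(x)$ and $\tau$ for the Kramers regime and $\tau = 1/(4f)$ for strong positive force, then integrates $p(x)$ over the rescaled barrier [0, 1] with a midpoint rule to confirm that it is normalised up to exponentially small terms.

```python
import math

def coalescence_pdf(x, f):
    """p(x) in the Kramers regime (f << -1), for x in the rescaled barrier [0, 1]."""
    a = abs(f)
    return (1.0 - math.exp(-2 * a * x) - math.exp(-2 * a * (1.0 - x))) * a / (a - 1.0)

def mean_coalescence_time(f):
    """Mean first passage time in the two limiting regimes quoted in the text."""
    if f > 0:
        return 1.0 / (4.0 * f)             # drift-dominated form, meaningful only for f >> 1
    a = abs(f)
    if a <= 1.0:
        raise ValueError("Kramers expression requires f << -1")
    return math.exp(2 * a) / (16.0 * f * f * (a - 1.0))

def integrate_unit_interval(func, n=10000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

f = -5.0
norm = integrate_unit_interval(lambda x: coalescence_pdf(x, f))
print(norm, mean_coalescence_time(f), mean_coalescence_time(5.0))
```

For $f = -5$ the integral deviates from 1 only by a term of order $e^{-2|f|}$, and the density is nearly flat in the interior of the barrier, in line with the Arrhenius argument given above.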

V. COUPLED DYNAMICS OF DNA BUBBLES AND SELECTIVELY SINGLE-STRAND DNA BINDING PROTEINS

A traditional puzzle had been the question why the presence of selectively single-strand
DNA binding proteins (SSBs) does not lead to full denaturation of the DNA [1]. While ideas
about a kinetic block were brought forth relatively early [69-71], experimentally this puzzle
could only be solved by single molecule methods in which the denaturation was not induced
by temperature but by force. In a series of experiments the binding and unbinding kinetics
of SSBs and their mutants to DNA denaturation bubbles and the resulting effect on the
denaturation force were studied in great detail [19, 72, 73]. Here we discuss a simple
homopolymer model for the SSB-DNA interaction, formulated in terms of a master equation
[48, 49].

The quantity of interest is the joint probability $P(m, n, t)$ to have a bubble consisting of

FIG. 14: Effective free energy of the SSB-DNA bubble interaction in the limit $\gamma \gg 1$ ( - ), and
free energy landscape for various fixed n ($u = 0.6$, $M = 40$, $c = 1.76$, $\lambda = 5$). Left: $\kappa = 0.5$;
Right: stronger binding, $\kappa = 1.5$. In the latter case the binding strength of the SSB suffices to
cause a decreasing effective free energy and therefore induce full denaturation of the DNA. Due to
the finite size effects the nucleation barrier for initiation of SSB exchange has to be crossed.

m broken bps, and n SSBs bound to the two arches of the bubble. In addition to the rates
for bubble increase and decrease, the rates for SSB binding and unbinding are necessary to
define the breathing dynamics in the presence of SSBs. On the statistical level, the effect
of the SSBs becomes coupled to the motion of the zipper forks. Thus, the rate for bubble
size decrease is proportional to the probability that no SSB is located right next to the
corresponding zipper fork; and the rate for SSB binding is proportional to the probability
that there is sufficient unoccupied space on the bubble. Binding is allowed to be asymmetric,
and is related to a parking lot problem in the following sense. The number $\lambda$ of bases occupied
by a bound SSB is usually (considerably) larger than one. In order to be able to bind in
between two already bound SSBs, the distance between these two SSBs must be larger than
$\lambda$. The larger $\lambda$, the less efficient the SSB binding becomes, similar to parking large cars
on a parking lot designed for small vehicles. Apart from the binding size $\lambda$ of the SSBs,
two additional physical parameters come into play: the unbinding rate q of the SSBs, and
their binding strength $\kappa = c_0 K^{\rm eq}$, consisting of the volume concentration $c_0$ of SSBs and the
equilibrium binding constant $K^{\rm eq} = v_0 \exp(\beta |E_{\rm SSB}|)$, with the typical SSB volume $v_0$ and
binding energy $E_{\rm SSB}$.

The coupled dynamics of SSB binding and bubble breathing is discussed in references
[48, 49]; similar effects in end-denaturing DNA were studied in detail in [65]. Here, we
report the behaviour of the effective free energy landscape in the limit of fast SSB binding,
in the sense that the dimensionless ratio $\gamma = q/k$ of SSB unbinding and bubble zipping
rates is large, $\gamma \gg 1$. This limit allows one to average out the SSB dynamics and to calculate
an effective free energy, in which the bubble dynamics with the slow variable m takes place. The
result for two different binding strengths $\kappa$ is shown in figure 14, along with the free energies
corresponding to keeping n fixed. It is apparent that while for lower $\kappa$ the presence of SSBs
diminishes the slope of the effective free energy, for larger $\kappa$ the slope actually becomes
negative. In the first case bubble opening is more likely, but still globally
unfavourable. In the latter case, the presence of SSBs indeed leads to full denaturation. One
observes distinct finite size effects due to $\lambda > 1$: SSB binding may occur only when the bubble
reaches a minimal size $m \ge \lambda$, a second SSB is allowed to bind to the same arch only
once $m \ge 2\lambda$, etc. This effect also produces the nucleation barrier for full denaturation
in the right plot of figure 14. Similar finite size effects were investigated for biopolymer
translocation in references [74, 75]. We note that the transition to denaturation could also
be achieved by reaching a smaller positive slope of the effective free energy in the presence
of SSBs, combined with additional titration or a change of the effective temperature through
actual temperature change or mechanical stretching, as performed in the experiments reported in
references [19, 72, 73].

VI.  CONCLUDING  REMARKS 

DNA  possesses  a  number  of  properties  that  render  it  a  very  attractive  model  system.  Thus 
the  study  of  the  DNA  denaturation  transition  has  occupied  statistical  physics  for  around 
five  decades.  DNA  is  comparatively  thin  and  stiff  locally  while  its  overall  length  is  fully 
macroscopic.  Thus  it  is  probably  the  closest  available  example  for  testing  the  predictions 
from  polymer  physics.  In  particular  single  DNA  can  be  probed  and  manipulated  and  its 
interactions  with  binding  proteins  and  chemicals  investigated.  This  includes  the  monitoring 
of single DNA bubbles and their interaction with selectively single-stranded DNA binding
proteins, both described here, as well as the interaction of DNA with intercalators [37]. By
now single molecule assays can also be used to study the search mechanisms of DNA binding
proteins scanning it for specific binding sites relevant in gene regulation and DNA repair
[20, 76-78]. This attractiveness of DNA combines with its ultimate role as the molecule of
life and therefore is one of the finest examples where the interests of biological physics meet
those of biochemists and molecular biologists.

The label century of biology is frequently bestowed upon the 21st (e.g., Ref. [79]). In the
wake of the success of biology, molecular and systems biology in particular, one experiences
a mushrooming number of works in the biological physics sector. Indeed many of these
problems pose very attractive and new questions to physicists and, along with the availability
of single molecule techniques, prompt new advances, for instance in statistical physics.

A prime example for the challenges ahead is the current lack of understanding of biochemical
processes in living cells under conditions of molecular crowding [80-82]. It is
being realised that knowledge obtained under dilute conditions in vitro does not necessarily
translate to the situation in vivo, and this point will need considerably more quantitative
investigation. As it stands the input from biological physics will be crucial, for example
regarding diffusive processes. It appears that subdiffusion of biopolymers occurs in conditions
of molecular crowding [83-85], this being the likely source for the strong scatter of time
averages and apparent diffusivities of single trajectories [83, 86, 87], requiring great care in
the quantitative analysis [88].